DETAILED ACTION 

1. This communication is in response to the Application filed on 8/25/2003. 

Drawings 

2. The drawings are objected to because Figure 2 element 240 states "Test to 
Speech Engine". Corrected drawing sheets in compliance with 37 CFR 1.121(d) are 
required in reply to the Office action to avoid abandonment of the application. Any 
amended replacement drawing sheet should include all of the figures appearing on the 
immediate prior version of the sheet, even if only one figure is being amended. The 
figure or figure number of an amended drawing should not be labeled as "amended." If 
a drawing figure is to be canceled, the appropriate figure must be removed from the 
replacement sheet, and where necessary, the remaining figures must be renumbered 
and appropriate changes made to the brief description of the several views of the 
drawings for consistency. Additional replacement sheets may be necessary to show the 
renumbering of the remaining figures. Each drawing sheet submitted after the filing date 
of an application must be labeled in the top margin as either "Replacement Sheet" or 
"New Sheet" pursuant to 37 CFR 1.121(d). If the changes are not accepted by the 
examiner, the applicant will be notified and informed of any required corrective action in 
the next Office action. The objection to the drawings will not be held in abeyance. 

Claim Objections 

3. Claim 3 is objected to because of the following informalities: "phase" in line 3 
should be "phrase". Appropriate correction is required. 

4. Claim 4 is objected to because of the following informalities: "a confidence" in 
line 2 should be "the confidence level". Appropriate correction is required. 

5. Claim 16 is objected to because of the following informalities: "a grammar sub- 
tree" in line 3 should be "the grammar sub-tree". Appropriate correction is required. 

6. Claim 22 is objected to because of the following informalities: "a speech 
recognition engine" in line 3 should be "the speech recognition engine". Appropriate 
correction is required. 

Claim Rejections - 35 USC § 103 

7. The following is a quotation of 35 U.S.C. 103(a) which forms the basis for all 
obviousness rejections set forth in this Office action: 

(a) A patent may not be obtained though the invention is not identically disclosed or described as set 
forth in section 102 of this title, if the differences between the subject matter sought to be patented and 
the prior art are such that the subject matter as a whole would have been obvious at the time the 
invention was made to a person having ordinary skill in the art to which said subject matter pertains. 
Patentability shall not be negatived by the manner in which the invention was made. 

8. The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 
USPQ 459 (1966), that are applied for establishing a background for determining 
obviousness under 35 U.S.C. 103(a) are summarized as follows: 

1. Determining the scope and contents of the prior art. 

2. Ascertaining the differences between the prior art and the claims at issue. 

3. Resolving the level of ordinary skill in the pertinent art. 

4. Considering objective evidence present in the application indicating 
obviousness or nonobviousness. 



Application/Control Number: 10/647,709 
Art Unit: 2609 



9. Claims 1-6, 8-13, 17-19, and 22-24 are rejected under 35 U.S.C. 103(a) as being 
unpatentable over Kennewick et al. (US PGPub 2004/0044516) in view of Crepy et al. 
(US 6,622,121). 

As to claims 1 and 18, Kennewick et al. discloses the improvement of a speech 
recognition engine, comprising: identifying one or more utterances (see page 1, right 
column, [0010], lines 3-4) for recognition by a speech recognition engine (see Figure 1, 
element 120 and page 6, right column, [0088], line 4); passing the one or more 
identified utterances to a text-to-speech module (see page 1, right column, [0012], lines 
8-9 and Figure 1, element 124); analyzing each utterance (see page 14, left column, 
[0188], line 8) to determine how closely each recognized utterance approximates the 
audio pronunciation from each utterance derived (see page 14, left column, [0188], lines 
8-10). However, Kennewick et al. does not specifically disclose the passing of the audio 
pronunciation of the identified utterance to the speech recognition engine and creating 
an utterance for each audio pronunciation that was passed. Crepy et al. discloses the 
passing of the audio pronunciation of each of the utterances to the speech recognition 
engine (see col. 1, line 65); creating an utterance for each audio pronunciation passed 
to the speech recognition engine (see col. 1, lines 66-67). It would have been obvious to 
one of ordinary skill in the art to have modified the identification of utterances and the 
analysis of the utterance presented by Kennewick et al. with the utilization of the output 
of the text to speech module into the speech recognition engine as presented by Crepy 
et al. The motivation to combine these two references involves testing the recognition of 
a spoken input (see Crepy et al., Abstract). 
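
For illustration only, the combination discussed above (utterance to text-to-speech module, audio to speech recognition engine, recognized utterance compared with the original) can be sketched as follows. This is a minimal sketch and is not taken from the application, Kennewick et al., or Crepy et al. The names `round_trip`, `RoundTripResult`, `fake_synthesize`, and `fake_recognize` are hypothetical, the two engines are stand-in callables rather than any real product's interface, and the string-similarity measure is an assumed proxy for "how closely" the recognized utterance matches.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Callable, Iterable, List


@dataclass
class RoundTripResult:
    expected: str      # utterance the engine is supposed to recognize
    recognized: str    # text the engine returned for the synthesized audio
    similarity: float  # 0.0 (nothing in common) to 1.0 (identical)


def _normalize(text: str) -> str:
    # Ignore case and repeated whitespace when comparing utterances.
    return " ".join(text.lower().split())


def round_trip(
    utterances: Iterable[str],
    synthesize: Callable[[str], bytes],  # hypothetical text-to-speech module
    recognize: Callable[[bytes], str],   # hypothetical speech recognition engine
) -> List[RoundTripResult]:
    results = []
    for expected in utterances:
        audio = synthesize(expected)   # audio pronunciation of the utterance
        recognized = recognize(audio)  # recognized utterance for that audio
        similarity = SequenceMatcher(
            None, _normalize(expected), _normalize(recognized)
        ).ratio()
        results.append(RoundTripResult(expected, recognized, similarity))
    return results


# Stand-ins so the sketch runs without any audio library (assumptions only).
def fake_synthesize(text: str) -> bytes:
    return text.encode("utf-8")


def fake_recognize(audio: bytes) -> str:
    # Simulate one misrecognition: "phrase" is heard as "phase".
    return audio.decode("utf-8").replace("phrase", "phase")


if __name__ == "__main__":
    for r in round_trip(["check balance", "repeat phrase"],
                        fake_synthesize, fake_recognize):
        print(f"{r.expected!r} -> {r.recognized!r} ({r.similarity:.2f})")
```

In this sketch an utterance that survives the round trip unchanged scores 1.0, and any misrecognition scores lower; a real engine would supply its own confidence value instead.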

As to claim 2, Kennewick et al. discloses assigning a confidence score to each 
utterance (see Figure 5, element 506). 

As to claims 3 and 19, Kennewick et al. discloses the assigning of a confidence 
score to each recognized utterance based on a confidence level associated with the 
utterance based on prior speech recognition engine training (see page 10, left column, 
[0151], lines 4-5). 

As to claims 4 and 10, Kennewick et al. discloses the determination being made 
of whether the recognized utterance is the same as the utterance derived by the speech 
recognition engine based on prior speech recognition training (see page 10, right 
column, [0151], lines 4-8) (e.g. It should be noted that there is a dictionary that is used 
to see whether the recognized utterance matches). It is inherent that the words from the 
dictionary and the words from the utterance are matched for similarity. 

As to claims 5 and 11, Kennewick et al. discloses wherein if the confidence score 
exceeds an acceptable level designating the recognized utterance as accurately 
recognized by the speech recognition engine (see page 14, left column, [0188], lines 
30-33). 

As to claim 6, Kennewick et al. discloses wherein if the confidence score is less 
than a certain value, a modification is made to the speech recognition engine to 
recognize the word (see page 14, left column, [0031]) (e.g. If the confidence level is less 
than a value, the system requests verification from a user or asks a question to remove 
any ambiguity. This is seen as a modification to the speech recognition engine to 
interpret the utterance). 

As to claim 8, Kennewick et al. discloses whereby modifying the speech 
recognition engine includes altering the pronounced utterance associated with the 
confidence score that is less than a threshold value (see page 10, right column, [0153], 
lines 16-21) (e.g. The additional information obtained from the user removes the 
ambiguity in the uttered word due to low confidence. Thus, the uttered word is altered to 
be recognized) such that the altered audio pronunciation obtains an acceptable 
confidence score upon next pass (see page 10, right column, [0154], lines 17-19) (e.g. 
The confidence scores are updated as the system learns more information). 

As to claim 9, Kennewick et al. discloses the reduction of the confidence score 
threshold level (see page 10, right column, [0154], lines 17-19). It is inherent that the 
constant update and learning of the system presented in the reference would alter the 
confidence score threshold as it would alter the confidence level of the word. 

As to claim 12, Kennewick et al. discloses the loading of utterances when 
identifying one or more utterances to be recognized (see Figure 1, element 112 and 
page 5, right column, [0081], line 1) (e.g. It is inherent to include a memory device for 
storing the dictionary containing the utterances to be recognized. Further, the reference 
indicates its relevance to computing devices). 

As to claim 13, Kennewick et al. discloses the extracting of one or more 
utterances via a dictionary unit (see page 10, left column, [0147], lines 2-4) (e.g. It should 
be noted that extraction is done by using the information from the dictionary). 

As to claim 17, Crepy et al. discloses the conversion of the audio pronunciation 
from audio format to a digital format (see col. 4, lines 57-58) (e.g. The reference states 
that the conversion is done after text to speech synthesis. The conversion from audio to 
digital before the signal passes into the speech recognition engine or by the speech 
recognition engine (which is done before the recognition process) will have no effect on 
the result (utterance recognition)); and analyzing phonetically the audio pronunciation of 
the utterance to create the recognized word (see col. 4, lines 59-64) (e.g. It should be 
noted that in order to compare the results from storage to that uttered, comparisons are 
done between the two. This would involve comparing the phonemes of the uttered word 
and the stored word). 

As to claim 22, Kennewick et al. discloses the improvement of a speech 
recognition engine, comprising: identifying one or more utterances (see page 1, right 
column, [0010], lines 3-4) for recognition by a speech recognition engine (see Figure 1, 
element 120 and page 6, right column, [0088], line 4); passing the one or more 
identified utterances to a text-to-speech module (see page 1, right column, [0012], lines 
8-9 and Figure 1, element 124); analyzing each utterance (see page 14, left column, 
[0188], line 8) to determine how closely each recognized utterance approximates the 
audio pronunciation from each utterance derived (see page 14, left column, [0188], lines 
8-10); assigning a confidence score to each recognized utterance based on the speech 
recognition engine's confidence in each recognized utterance based on prior training of 
the speech recognition engine to recognize similar words (see page 10, right column, 
[0151], lines 4-8) (e.g. It should be noted that there is a dictionary that is used to see 
whether the recognized utterance matches); if the confidence score is less than an 
acceptable threshold, modifying the speech recognition engine to recognize the 
utterance (see page 14, left column, [0031]) (e.g. If the confidence level is less than a 
value, the system requests verification from a user or asks a question to remove any 
ambiguity. This is seen as a modification to the speech recognition engine to interpret 
the utterance). However, Kennewick et al. does not specifically disclose the deriving 
and passing of the audio pronunciation of the identified utterance to the speech 
recognition engine and creating an utterance for each audio pronunciation that was 
passed. Crepy et al. discloses the passing of the audio pronunciation of each of the 
utterances to the speech recognition engine (see col. 1, line 65); creating an utterance 
for each audio pronunciation passed to the speech recognition engine (see col. 1, lines 
66-67). It would have been obvious to one of ordinary skill in the art to have modified 
the identification of utterances and the analysis of the utterance presented by 
Kennewick et al. with the utilization of the output of the text to speech module into the 
speech recognition engine as presented by Crepy et al. The motivation to combine 
these two references involves testing the recognition of a spoken input (see Crepy et 
al., Abstract). 

As to claim 23, Kennewick et al. discloses whereby modifying the speech 
recognition engine includes altering the pronounced utterance associated with the 
confidence score that is less than a threshold value (see page 10, right column, [0153], 
lines 16-21) (e.g. The additional information obtained from the user removes the 
ambiguity in the uttered word due to low confidence. Thus, the uttered word is altered to 
be recognized) such that the altered audio pronunciation obtains an acceptable 
confidence score upon next pass (see page 10, right column, [0154], lines 17-19) (e.g. 
The confidence scores are updated as the system learns more information). 

As to claim 24, Kennewick et al. discloses the reduction of the confidence score 
threshold level (see page 10, right column, [0154], lines 17-19). It is inherent that the 
constant update and learning of the system presented in the reference would alter the 
confidence score threshold as it would alter the confidence level of the word. 

10. Claims 7 and 20-21 are rejected under 35 U.S.C. 103(a) as being unpatentable 
over Kennewick et al. and Crepy et al. as applied to claim 12 above, and further in view 
of Knott et al. (US PGPub 2003/0191648) as applied to claims 5 and 19 above, and 
further in view of Bickley et al. (US 7,013,276). 

As to claims 7 and 20, Kennewick et al. and Crepy et al. disclose improving the 
performance of a speech recognition engine. However, Kennewick et al. and Crepy et 
al. do not specifically disclose the notification to a developer when the score is lower 
than a threshold value. Bickley et al. discloses an alert mechanism for words that are 
similar and are subject to confusion (see col. 10, lines 63-65) from threshold calculation 
(see col. 10, lines 38-40). It would have been obvious to one of ordinary skill in the art 
to modify the speech recognition performance methods presented by Kennewick et al. 
and Crepy et al. by the use of a notification sent to a software developer when the value is 
below the threshold. The motivation to combine the two references involves the 
distinguishing between similar words, which may not be recognized by speech 
recognition engines (see Bickley et al. col. 2, lines 27-36). 
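
For illustration only, the threshold logic at issue in claims 5, 7, 11, and 20 (designate an utterance as accurately recognized when its confidence score is acceptable, otherwise notify a developer) can be sketched as below. This is a hedged sketch, not the applicant's or any cited reference's implementation. The function name `triage`, the `notify` callback, and the example scores are hypothetical, and treating a score exactly equal to the threshold as acceptable is an assumption, since the claims as summarized here do not address equality.

```python
from typing import Callable, Dict, List, Tuple


def triage(
    scores: Dict[str, float],
    threshold: float,
    notify: Callable[[str, float], None],  # hypothetical developer alert hook
) -> Tuple[List[str], List[str]]:
    """Split utterances into accepted and flagged lists by confidence score."""
    accepted: List[str] = []
    flagged: List[str] = []
    for utterance, score in scores.items():
        if score >= threshold:  # assumption: equality counts as acceptable
            accepted.append(utterance)
        else:
            flagged.append(utterance)
            notify(utterance, score)  # alert for each low-confidence utterance
    return accepted, flagged


if __name__ == "__main__":
    alerts: List[Tuple[str, float]] = []
    ok, low = triage({"yes": 0.92, "no": 0.40}, 0.75,
                     lambda u, s: alerts.append((u, s)))
    print("accepted:", ok)
    print("flagged:", low)
    print("alerts:", alerts)
```

Passing the alert mechanism in as a callback keeps the sketch independent of how a developer would actually be notified (log entry, e-mail, report), which the rejection does not specify.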

As to claim 21, Kennewick et al. discloses the extracting of one or more 
utterances via a dictionary unit (see page 6, right column, [0088], lines 4-6) (e.g. It should 
be noted that extraction is done by using the information from the dictionary); to pass 
each extracted utterance to the text-to-speech converter (see page 6, right column, 
[0089], lines 1-7). 

11. Claims 14-16 are rejected under 35 U.S.C. 103(a) as being unpatentable over 
Kennewick et al. and Crepy et al. as applied to claim 12 above, and further in view of 
Knott et al. (US PGPub 2003/0191648). 

As to claim 14, Kennewick et al. and Crepy et al. disclose improving the 
performance of a speech recognition engine. However, Kennewick et al. and Crepy et 
al. do not specifically disclose the categorizing of utterances by grammar type. Knott et 
al. discloses the categorizing of the utterance by answer type and the grouping of the 
answers to indicate either an affirmation or a refutation (see page 3, right column, [0021], 
lines 5-7) (e.g. The following categorizing of affirmations and refutations is similar to what 
the applicant is interpreting grammar type to be). It would have been obvious to one of 
ordinary skill in the art to modify the speech recognition performance methods 
presented by Kennewick et al. and Crepy et al. by the use of categories for words as 
shown by Knott et al. The motivation to combine the two references involves various 
answers given by users (see Knott et al., page 1, left column, [0003], lines 1-4). 

As to claim 15, Knott et al. discloses the inclusion of the subcategories as a 
group containing all utterances (see page 3, right column, [0021], lines 5-7) (e.g. The 
glossary contains both the refutations and affirmations). 

As to claim 16, Knott et al. discloses the identifying of an utterance for 
recognition by identifying the category to which the spoken word belongs (see page 3, 
right column, [0021], lines 5-13) (e.g. It is inherent that, when finding the 
confidence score, the correct category with which the spoken word is associated 
is identified). 

Conclusion 

12. The prior art made of record and not relied upon is considered pertinent to 
applicant's disclosure. 

US 6,078,885 is cited to teach revising phonetic transcriptions of words in a 
dictionary. US 6,119,085 is cited to teach a method for finding differences in 
pronunciations between a vocabulary and a text to speech engine. US 7,006,971 is 
cited to teach a method of recognizing speech utterances by letter sequence. 

Any inquiry concerning this communication or earlier communications from the 
examiner should be directed to Paras Shah whose telephone number is (571)270-1650. 
The examiner can normally be reached on Mon.-Fri., 7:30 a.m.-5:00 p.m. EST. 

If attempts to reach the examiner by telephone are unsuccessful, the examiner's 
supervisor, Xiao Wu, can be reached on (571)272-7761. The fax phone number for the 
organization where this application or proceeding is assigned is 571-273-8300. 

Information regarding the status of an application may be obtained from the 
Patent Application Information Retrieval (PAIR) system. Status information for 
published applications may be obtained from either Private PAIR or Public PAIR. 
Status information for unpublished applications is available through Private PAIR only. 
For more information about the PAIR system, see http://pair-direct.uspto.gov. Should 
you have questions on access to the Private PAIR system, contact the Electronic 
Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a 
USPTO Customer Service Representative or access to the automated information 
system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. 